12 - 25.7. Regression and Classification with Linear Models (Part 2) [ID:30380]

Now that we understand what's happening, let's make it less simple and go to the multivariate case, where we are interested in learning functions that depend on more than one aspect. The house-price example was carefully chosen so that there is only one input, but you might be interested in the house price in terms of square feet, the size of the lot, whether you have a bay view or not (this was Berkeley, after all), how far you are from the next BART station, and whether there are trees on your lot. That gives me five factors, so the function I want to learn goes from, say, R^5 into the real numbers.

Okay, so everything becomes a little bit more complicated in the multivariate case. The hypothesis still has to be linear: we take a weighted sum over the components of an example. Every example is now, in my case, a five-vector, and we have n of those. We can write each example as a vector x and take the linear hypothesis h_w(x) = w0 + w1·x1 + ... + w5·x5, which is essentially the same as in the univariate case, except that we have more arguments.

Again, w0 is somehow special, and the whole thing becomes much nicer to write down if we invent a zeroth component x0 for our example and always set it to one. Then this sum of products, plus the dangling constant term, becomes a plain dot product: h_w(x) = w · x, or equivalently w^T x if you treat the vectors as matrices and transpose one of them. All I'm trying to say is: you don't have to program sums yourself. You can use a linear algebra package, say NumPy or something like Matlab, to do the usual things.
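To make the dot-product trick concrete, here is a minimal NumPy sketch; all feature values and weights are made up for illustration. Prepending the constant 1 to the example lets us treat w0 like every other weight.

```python
import numpy as np

# Made-up example with five features (all values are illustrative):
# square feet, lot size, bay view (0/1), km to the next BART station, trees (0/1)
x = np.array([1500.0, 4000.0, 1.0, 0.8, 1.0])

# Weights w0..w5, where w0 is the intercept.
w = np.array([50000.0, 120.0, 5.0, 30000.0, -10000.0, 2000.0])

# Invent the zeroth component x0 = 1 ...
x_aug = np.concatenate(([1.0], x))

# ... so the hypothesis is a single dot product, with no special case for w0.
h = w @ x_aug  # == w[0]*1 + w[1]*x[0] + ... + w[5]*x[4]
print(h)
```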

So essentially, what we want to solve is this: find the best w, the argument that minimizes the squared-error loss, where the hypothesis h_w is now this dot product. We can minimize that, and we do exactly the same as before, namely we give ourselves an update equation. What we get looks almost the same as in the univariate case, except that, of course, where the hypothesis was a single product we now sum over all the directions. And where before we had two kinds of update, one for the linear weight and one for the constant, the little x0 = 1 trick gives us one uniform update equation for every w_i.
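Here is a small sketch of that uniform update in NumPy, assuming the standard batch gradient step for the squared-error loss, w_i ← w_i + α · Σ_j (y_j − h_w(x_j)) · x_{j,i}, averaged over the examples here to keep the step size stable; the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n examples with 5 features, plus the artificial x0 = 1 column.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
true_w = np.array([2.0, 1.0, -3.0, 0.5, 0.0, 4.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

# Batch gradient descent on the squared-error loss: the same update rule
# applies to every w_i, w0 included, thanks to the x0 = 1 column.
w = np.zeros(6)
alpha = 0.01  # learning rate
for _ in range(2000):
    residuals = y - X @ w                  # y_j - h_w(x_j) for all examples at once
    w = w + alpha * (X.T @ residuals) / n  # averaged gradient step
print(w)  # should approach true_w
```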

All of this works well with gradient descent, but we can also get a closed-form expression. It turns out that if you interpret our set of examples, the x's, as a matrix, call it capital X, then you can actually solve the equation, if you are careful enough with the linear algebra, and get this solution: w* = (X^T X)^(-1) X^T y, where y is the vector of targets. This w* minimizes the squared error.
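In NumPy that closed form is essentially one line. This is a sketch on synthetic data; in practice you would let a least-squares solver such as np.linalg.lstsq do the same minimization rather than inverting X^T X explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Design matrix X (leading column of ones for w0) and target vector y.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
true_w = np.array([2.0, 1.0, -3.0, 0.5, 0.0, 4.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

# The closed form w* = (X^T X)^{-1} X^T y.
w_star = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically safer: a least-squares solver computes the same minimizer.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_star)
print(w_lstsq)
```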

So again we have a closed form, which of course turns into big sums if you really want to compute it, and we have gradient descent in a very simple way. This has been known for quite a while; most of it was already known to Gauss, two centuries ago.

We do have a problem here, however. Remember that for univariate linear regression we didn't have an overfitting problem; we had this nice convexity result. But in the multivariate case we might have what you could think of as redundant dimensions. Say we have the house-price example again and one of our arguments is something like the moon phase, and let's assume that the moon phase actually doesn't influence house prices, which is probably reasonable. Then we have dimensions in which everything is flat, where the derivatives are zero, and so on. There you want to do something special: instead of learning to forget that argument, by making the respective weight zero, the learner might actually say, well, it has to be a half moon for the optimal hypothesis, and everybody goes out and buys their house at half moon. That is clearly overfitting, and not something we want.

So what do we do? Think of this linear regression business as a tool chest. You are minimizing, wonderful, and you have all the tools for minimizing: either solving analytically or doing gradient descent. Now, where we have the possibility of overfitting, what do you do? Well, you rummage around in your tool chest.
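One standard tool you will find in that chest is regularization, which the clip's description also names: penalize large weights so that irrelevant dimensions like the moon phase are not fitted. The sketch below adds an L2 penalty λ‖w‖², which turns the closed form into w* = (X^T X + λI)^(-1) X^T y (ridge regression). The choice of λ and leaving w0 unpenalized are illustrative assumptions, not something fixed by the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Data with a "moon phase"-style last feature that has no real influence on y.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
true_w = np.array([2.0, 1.0, -3.0, 0.5, 4.0, 0.0])  # last weight is truly zero
y = X @ true_w + 0.1 * rng.normal(size=n)

lam = 1.0                     # regularization strength (illustrative choice)
I = np.eye(X.shape[1])
I[0, 0] = 0.0                 # conventionally the intercept w0 is not penalized

# Ridge regression: minimize ||y - X w||^2 + lam * ||w||^2.
w_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
print(w_ridge)                # the irrelevant weight stays close to zero
```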

Part of a chapter: Chapter 25. Learning from Observations

Access: Open access

Duration: 00:18:21 min

Recording date: 2021-03-30

Uploaded: 2021-03-30 17:06:36

Language: en-US

Definition of Multivariate Linear Regression, an analytical solution and its regularization.
